Description

[0001] The present invention relates to a diagnostic composition comprising nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of genes showing abnormal gene expression associated with leukemias. The present invention also relates to the use of said nucleic acid molecules for diagnosis of a leukemia or leukemia subtypes or a disposition to a leukemia.

[0002] Today leukemias are classified into four different groups: acute myeloid (AML), acute lymphatic (ALL), chronic myeloid (CML) and chronic lymphatic leukemia (CLL). Within these groups, several subcategories can be identified further using a panel of standard techniques as described below. The incidence of leukemias increases with age and is 5/100,000/year in AML, 1/100,000/year in ALL, 1/100,000/year in CML and 6/100,000/year in CLL. Several methods for classification have to be applied at diagnosis and before treatment starts: cytomorphology and cytochemistry, multiparameter immunophenotyping, cytogenetics including fluorescence in situ hybridization, and molecular techniques such as polymerase chain reaction (PCR). So far only a combination of these techniques allows a precise diagnosis, which is necessary to apply state-of-the-art treatment. An exact diagnosis is mandatory: in CML, for example, detection of a specific cytogenetic abnormality, the translocation t(9;22), or of its molecular counterpart, the BCR/ABL rearrangement, is required to establish the diagnosis. While all patients with CML show a BCR-ABL rearrangement and are therefore homogeneous with regard to the primary genetic abnormality, in AML and ALL at least 10-15 different subgroups have been identified on the morphological, genetic or molecular level. In CLL, too, several subgroups can be clearly separated. These different subcategories of leukemia are associated with varying clinical outcomes and are therefore the basis for different treatment strategies. The importance of highly specific classification is illustrated in more detail below for AML, a very heterogeneous group of diseases.

[0003] Data from clinical trials showed that the outcome of patients with AML varies over a broad range. Several parameters influencing prognosis have been identified. These can be assigned to different categories: patient characteristics (i.e. age, comorbidity), therapy, and biology of the AML. Therefore, a lot of effort was invested to identify biological entities and to distinguish subgroups of AML which are associated with a favorable, intermediate or unfavorable prognosis, respectively. In order to allow a comparison between different studies a classification of AML was mandatory. In 1976 the French-American-British (FAB) cooperative group proposed the FAB classification, which is based on cytomorphology and cytochemistry and separates AML subgroups according to the morphological appearance of blasts in the blood and bone marrow. In addition, it was recognized that genetic abnormalities occurring in the leukemic blasts had a major impact on the morphological picture and even more on the prognosis. So far, the karyotype of the leukemic blasts is the most important independent prognostic factor regarding response to therapy as well as survival. For clinical purposes karyotype analysis allows discrimination between three major prognostic groups. A favorable outcome under currently used treatment regimens, with cure rates from 50% up to 85%, was observed in several studies in patients with a) t(8;21)(q22;q22) occurring in AML M2, b) inv(16)(p13q22) occurring in AML M4eo and c) t(15;17)(q22;q11-12) occurring in AML M3/M3v. In contrast, chromosome aberrations with an unfavorable clinical course are -5/del(5q), -7/del(7q), inv(3)/t(3;3) and complex aberrant karyotypes, with cure rates of only 10%. The remaining AML patients are assigned to a prognostically intermediate group. This latter group is very heterogeneous because it includes patients with a normal karyotype as well as those with rare chromosome aberrations of as yet unknown prognostic impact.
[0004] The subclassification of leukemias becomes increasingly important to guide therapy. The development of new, specific treatment approaches requires the identification of specific subtypes that may benefit from a distinct therapeutic protocol. It has already been shown in two entities that the development of specific drugs can improve the outcome of distinct subsets of leukemia. One important example is the development of a new therapeutic drug (STI571) for the treatment of chronic myeloid leukemia (CML): this designed molecule inhibits the CML-specific chimeric tyrosine kinase BCR-ABL generated from the genetic defect observed in CML, the BCR-ABL rearrangement due to the translocation between chromosomes 9 and 22 (t(9;22)(q34;q11)). First data show that therapy response is dramatically higher in patients treated with this new drug as compared to all other drugs that had been used so far. Another example is the acute myeloid leukemia subtype AML M3 and its variant M3v, both with karyotype t(15;17)(q22;q11-12). The introduction of a new drug (all-trans retinoic acid, ATRA) has improved the outcome in this subgroup of patients from about 50% to 85% long-term survivors. As patients suffering from these specific leukemia subtypes must be identified as fast as possible so that the best therapy can be applied, diagnostics today must accomplish subclassification with maximal precision. Not only for these subtypes but also for several other leukemia subtypes, different treatment approaches could improve outcome. Therefore, rapid and precise identification of distinct leukemia subtypes is the future goal for diagnostics.

[0005] So far a combination of methods is necessary to obtain the most important information in leukemia diagnostics: Analysis of the morphology and cytochemistry of bone marrow blasts and peripheral blood cells is necessary to establish the diagnosis. In some cases the addition of immunophenotyping is mandatory to separate very undifferentiated AML from acute lymphoblastic leukemia and CLL. The leukemia subtypes investigated can be diagnosed by cytomorphology alone if an expert reviews the smears. However, a genetic analysis based on chromosome analysis, fluorescence in situ hybridization or RT-PCR, together with immunophenotyping, is required in order to assign all cases to the right category. The aim of these techniques, besides diagnosis, is mainly to determine the prognosis of the leukemia. A major disadvantage of these methods, however, is that viable cells are necessary, as the cells for genetic analysis have to divide in vitro in order to obtain metaphases for the analysis. Another problem is the long time of 72 hours from receipt of the material in the laboratory to the result. Furthermore, great experience in the preparation of chromosomes and even more in the analysis of karyotypes is required to obtain the correct result in at least 90% of cases. Such experts are necessary for all the other techniques mentioned above as well.

[0006] Thus, the technical problem underlying the present invention is to provide means for leukemia diagnostics which overcome the disadvantages of the prior art diagnostic methods.

[0007] The solution to said technical problem is achieved by providing the embodiments characterized in the claims. Based on biomathematical analysis of gene expression profiles, a subset of genes could be identified which forms the basis for designing and developing a novel diagnostic approach based on microarray technology that replaces today's standard procedures in the diagnosis of leukemia. These standard diagnostic procedures increasingly require centralized core facilities with both expert personnel in the fields of cytomorphology, cytogenetics and molecular genetics and expensive lab equipment, which causes increasing costs for adequate diagnosis. The present invention provides a novel cost-effective diagnostic tool which is less time-consuming and easy to operate but nevertheless as accurate and safe as all of today's standard methods combined. The set of genes allows clinical samples to be assigned as either healthy or malignant simply on the basis of their gene expression profiles. This is the basis for such a diagnostic tool. Furthermore, these genes already allow the diagnosis to be predicted from the genetic abnormality reflected in the expression pattern, and allow discrimination between different prognostically relevant entities.

[0008] The development of a leukemia diagnostic tool, preferably microarray-based, provides for all patients and specimens a reproducible, highly specific and rapid method to obtain important information for treatment strategies in leukemia. This technique can be established in every laboratory using basic methods of molecular biology and does not require hematologists or cytogeneticists with several years of experience in leukemia diagnostics. Material for the analysis can be sent over large distances, as it is not necessary that cells arrive viable in the laboratory. Therefore, a centralization of leukemia diagnostics with very high quality is possible.

[0009] Moreover, the accumulation of extensive knowledge about gene expression profiles in leukemia subtypes which are not characterized by specific genetic abnormalities leads to a more precise classification compared to all other methods used so far. In addition, these data are helpful for understanding the pathogenesis of leukemia and will allow the identification of genes which are specifically dysregulated. These may be considered as potential targets for therapeutic interventions specifically designed for the different leukemia subtypes.

[0010] Accordingly, the present invention relates to a diagnostic composition comprising at least one nucleic acid molecule, preferably (a) single-stranded nucleic acid molecule(s), which is capable of specifically hybridizing to the mRNA of at least one gene listed in Table 1.

[0011] The use of said nucleic acid molecules for diagnosis of leukemia subtypes, preferably based on microarray technology, offers the following advantages: (1) diagnosis is more rapid and more precise, (2) it is easy to use in laboratories without specialized experience, (3) it abolishes the requirement for analyzing viable cells for chromosome analysis (transport problem), (4) very experienced hematologists for cytomorphology, cytochemistry and immunophenotyping, as well as cytogeneticists and molecular biologists, are no longer required, and (5) it improves the subclassification of leukemia through the definition of new entities based on gene expression profiles in those subtypes that are not clearly defined with the methods of the prior art (class discovery).

[0012] As used herein, the term "capable of specifically hybridizing" has the meaning of hybridization under conventional hybridization conditions, preferably under stringent conditions as described, for example, in Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. Also contemplated are nucleic acid molecules that hybridize at lower stringency hybridization conditions. Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation of formamide concentration (lower percentages of formamide result in lowered stringency), salt conditions, or temperature. For example, lower stringency conditions include an overnight incubation at 37°C in a solution comprising 6X SSPE (20X SSPE = 3 M NaCl; 0.2 M NaH2PO4; 0.02 M EDTA, pH 7.4), 0.5% SDS, 30% formamide, 100 µg/ml salmon sperm blocking DNA, followed by washes at 50°C with 1X SSPE, 0.1% SDS. In addition, to achieve even lower stringency, washes performed following stringent hybridization can be done at higher salt concentrations (e.g. 5X SSC). Variations in the above conditions may be accomplished through the inclusion and/or substitution of alternate blocking reagents used to suppress background in hybridization experiments. The inclusion of specific blocking reagents may require modification of the hybridization conditions described above, due to problems with compatibility.
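The salt concentrations implied by the 6X and 1X SSPE solutions above follow from simple dilution arithmetic (C1 x V1 = C2 x V2). The following is a minimal sketch of that calculation; the function names and the 100 ml example volume are illustrative assumptions and are not part of the original disclosure.

```python
# Component molarities of the 20X SSPE stock as stated in the text (mol/l).
SSPE_20X = {"NaCl": 3.0, "NaH2PO4": 0.2, "EDTA": 0.02}


def dilute_sspe(target_x, stock_x=20.0, stock=SSPE_20X):
    """Return the component molarities (mol/l) of SSPE at `target_x` strength."""
    return {name: molarity * target_x / stock_x for name, molarity in stock.items()}


def stock_volume_ml(target_x, final_volume_ml, stock_x=20.0):
    """Volume of stock needed for `final_volume_ml`, from C1 * V1 = C2 * V2."""
    return final_volume_ml * target_x / stock_x


hyb = dilute_sspe(6)   # hybridization solution, 6X SSPE
wash = dilute_sspe(1)  # wash solution, 1X SSPE

# Hypothetical example: stock volume needed for 100 ml of 6X SSPE.
stock_for_100_ml = stock_volume_ml(6, 100)
```

Under these assumptions the 6X hybridization solution contains roughly 0.9 M NaCl and the 1X wash roughly 0.15 M NaCl, which illustrates why the wash step is the more stringent of the two with respect to salt.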

[0013] As a hybridization probe (or primer), nucleic acid molecules can be used, for example, that have exactly or basically the nucleotide sequence of the genes depicted in Table 1 or parts of these sequences. The term nucleic acid molecule as used herein also comprises fragments, which are understood to be parts of the nucleic acid molecules that are long enough to specifically hybridize to transcripts of the genes of Table 1. These nucleic acid molecules can be used, for example, as probes or primers in a diagnostic assay. Preferably, the nucleic acid molecules of the present invention have a length of at least 13, in particular of at least 20 and particularly preferred of at least 25 nucleotides. The nucleic acid molecules of the invention or parts thereof can also be used, for example, as primers for a PCR reaction. The fragments used as hybridization probes can be synthetic fragments that were produced by means of conventional synthesis methods.

[0014] In a preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of at least genes listed in Table 1.
[0015] In a more preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of at least genes listed in Table 1. In the most preferred embodiment, the diagnostic composition of the present invention comprises at least nucleic acid molecules which are capable of specifically hybridizing to the mRNAs of at least all genes listed in Table 1.
[0016] In a further preferred embodiment, the nucleic acid molecules of the diagnostic composition of the present 
invention are bound to (a) a solid support, for example, a polystyrene microtiter dish or nitrocellulose membrane or 
glass surface or (b) to non-immobilised particles in solution. 

[0017] In an even more preferred embodiment, the nucleic acid molecules of the diagnostic composition are present in a microarray format which can be established according to well known methods; for details see, e.g., [http://www.affymetrix.com/technology/tech_spotted.html; http://www.affymetrix.com/technology/tech_probe.html].
[0018] The present invention also provides the use of (a) nucleic acid molecule(s) of the present invention for the preparation of a diagnostic composition for the diagnosis of a leukemia, of several leukemia subtypes or of a disposition to a leukemia. For the diagnosis of a particular leukemia subtype, preferably at least 5 different nucleic acid molecules are used as probes. For diagnosis, preferably bone marrow or peripheral blood is used. For diagnosis, the target sample is contacted with (a) nucleic acid molecule(s) of the present invention and the concentration of individual mRNAs is compared with the mRNA expression profile levels of a test sample obtained from healthy donors.

[0019] The nucleic acid molecule is typically a nucleic acid probe for hybridization or a primer for PCR. The person skilled in the art is in a position to design suitable nucleic acid probes based on the information provided in Table 1.
[0020] The target cellular component, i.e. mRNA, e.g. in bone marrow (BM) or blood, may be detected directly in situ, e.g. by in situ hybridization, or it may be isolated from other cell components by common methods known to those skilled in the art before contacting with a probe. Detection methods include Northern blot analysis, RNase protection, in situ methods, e.g. in situ hybridization, in vitro amplification methods (PCR, LCR, QRNA replicase or RNA-transcription/amplification (TAS, 3SR), reverse dot blot disclosed in EP-B1 0 237 362) and other detection assays that are known to those skilled in the art. Preferably, detection is based on a microarray.

[0021] Products obtained by in vitro amplification can be detected according to established methods, e.g. by separating the products on agarose gels and by subsequent staining with ethidium bromide. Alternatively, the amplified products can be detected by using labeled primers for amplification or labeled dNTPs.

[0022] The probes can be detectably labeled, for example, with a radioisotope, a bioluminescent compound, a chemiluminescent compound, a fluorescent compound, a metal chelate, biotin or an enzyme.

Brief Description of the Drawings: 

[0023]

Figure 1a: Principal Component Analysis
Figure 1b: Hierarchical Cluster Analysis
Figure 2: Classification Accuracy
Figures 3a, 3b1, 3b2: PCA of AML data based on 312 genes
Figure 4: Decision Trees according to l(E)
Figure 5a: Pair-wise Comparison of Normal BM and AML
Figure 5b: Principal Component Analysis
Figure 5c: Hierarchical Cluster Analysis
Figure 6a: Pair-wise Comparison of Normal BM and ALL
Figure 6b: Principal Component Analysis
Figure 6c: Hierarchical Cluster Analysis
Figure 7a: Pair-wise Comparison of Normal BM and CML
Figure 7b: Principal Component Analysis
Figure 7c: Hierarchical Cluster Analysis
Figure 8a: Pair-wise Comparison of Normal BM and CLL
Figure 8b: Principal Component Analysis
Figure 8c: Hierarchical Cluster Analysis
Figure 9a: AML-WHO Classification
Figure 9b: Principal Component Analysis
Figure 9c: Hierarchical Cluster Analysis
Figure 10a: Comparison of Normal BM versus Leukemia
Figure 10b: Principal Component Analysis
Figure 10c: Hierarchical Cluster Analysis



[0024] The following Examples illustrate the invention. 






EXAMPLE 1 
General methods 

(A) Selection and characterization of leukemia samples

[0025] Bone marrow (BM) aspirates were taken at the time of the initial diagnostic biopsy and remaining material was immediately lysed in RLT buffer (Qiagen), frozen and stored at -80°C until preparation for gene expression analysis. For microarray analysis the GeneChip System (Affymetrix, Santa Clara, CA, USA) was used. The targets for GeneChip analysis were prepared according to the current Expression Analysis Technical Manual. Briefly, frozen lysates of the leukemia samples were thawed and homogenized (QIAshredder, Qiagen), and total RNA was extracted (RNeasy Mini Kit, Qiagen). Normally 10 µg total RNA isolated from 1 x 10^7 cells was used as starting material in the subsequent cDNA synthesis using an oligo-dT-T7 promoter primer (cDNA Synthesis Kit, Roche Molecular Biochemicals). The cDNA was purified by phenol-chloroform extraction and precipitated with 100% ethanol overnight. For detection of the hybridized target nucleic acid, biotin-labeled ribonucleotides were incorporated during the in vitro transcription reaction (Enzo® BioArray™ HighYield™ RNA Transcript Labeling Kit, ENZO). After quantification of the purified cRNA (RNeasy Mini Kit, Qiagen), 15 µg were fragmented by alkaline treatment (200 mM Tris-acetate, pH 8.2, 500 mM potassium acetate, 150 mM magnesium acetate) and added to the hybridization cocktail, sufficient for 5 hybridizations on standard GeneChip microarrays. Before expression profiling, "Test3 Probe Arrays" (Affymetrix) were used to monitor the integrity of the cRNA. Only labeled cRNA cocktails which showed a ratio of the measured intensity of the 3' end to the 5' end of the GAPDH gene of less than 3.0 were selected for subsequent hybridization on HG-U95Av2 probe arrays (Affymetrix). Washing and staining of the probe arrays was performed as described (Lipshutz et al., Nat. Genet. 21 (1 Suppl.) (1999), 20-24; Lockhart et al., Nat. Biotechnol. 14(13) (1996), 1675-1680). The Affymetrix software (Microarray Suite, Version 4.0.1) extracted fluorescence intensities from each element on the arrays as detected by confocal laser scanning according to the manufacturer's recommendations.
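The cRNA integrity criterion described above (3'/5' GAPDH intensity ratio below 3.0) can be expressed as a simple filter. The sketch below is illustrative only: the cocktail names and intensity values are hypothetical, and treating a non-positive 5' signal as a failed check is an assumption not stated in the text.

```python
def passes_integrity_check(intensity_3prime, intensity_5prime, max_ratio=3.0):
    """Return True if the 3'/5' GAPDH signal ratio is below `max_ratio`."""
    if intensity_5prime <= 0:
        # Ratio is undefined; this sketch treats it as a failed check (assumption).
        return False
    return intensity_3prime / intensity_5prime < max_ratio


# Hypothetical Test3 array read-outs: (3' probe set intensity, 5' probe set intensity).
test3_results = {
    "cocktail_A": (1200.0, 800.0),   # ratio 1.5
    "cocktail_B": (3000.0, 700.0),   # ratio about 4.3
    "cocktail_C": (900.0, 310.0),    # ratio about 2.9
}

# Only cocktails passing the check would proceed to the HG-U95Av2 arrays.
selected = [name for name, (s3, s5) in test3_results.items()
            if passes_integrity_check(s3, s5)]
```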

(B) Data analysis 

[0026] Class separation by principal component analysis and hierarchical cluster analysis: In a first step, the dimensionality of the number of genes was reduced. To this end, the data from each array were scaled to a target intensity value of 50 (Affymetrix Microarray Suite) in order to be able to perform inter-array comparisons. Then all data were analyzed using Significance Analysis of Microarrays (Multiclass Response, Stanford University) (Tusher et al., PNAS USA 98(9) (2001), 5116-5121) and a distinct number of genes was selected based on a permutation test. This reduced set of significant genes was then analyzed using the Java application J-Express (www.molmine.com; Dysvik and Jonassen, Bioinformatics 17(4) (2001), 369-370). Principal Component Analysis and Hierarchical Cluster Analysis (parameters: cluster method single linkage, distance metric Euclidean) showed a clear separation of the analyzed groups of samples, e.g. healthy bone marrow versus leukemia.
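Two of the steps named above, scaling each array to a target intensity of 50 and single-linkage clustering with Euclidean distance, can be sketched in a few lines. This is not the Microarray Suite or J-Express implementation: the scaling here uses a plain arithmetic mean as a simplifying assumption, the sample names and intensities are hypothetical, and the gene selection and PCA steps are omitted.

```python
import math
from statistics import mean


def scale_to_target(intensities, target=50.0):
    """Rescale one array so that its mean intensity equals `target`."""
    factor = target / mean(intensities)
    return [value * factor for value in intensities]


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(profiles):
    """Agglomerative single-linkage clustering.

    `profiles` maps a sample name to its expression vector. Returns the
    merges in order as (members_of_cluster_a, members_of_cluster_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical raw intensities for three genes on four arrays.
raw = {
    "normal_1": [10.0, 20.0, 30.0],
    "normal_2": [20.0, 40.0, 62.0],
    "leukemia_1": [30.0, 20.0, 10.0],
    "leukemia_2": [58.0, 42.0, 20.0],
}
scaled = {name: scale_to_target(values) for name, values in raw.items()}
merges = single_linkage(scaled)
```

With these made-up values the two normal arrays merge first, the two leukemia arrays second, and the final merge joins the two groups, which is the kind of separation the dendrograms in the figures are meant to show.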

(C) Identification of differentially expressed genes according to Golub et al. (Science 286(5439) (1999), 531-537) 


[0027] A previously described method (Golub et al., Science 286(5439) (1999), 531-537) was modified in order to 
reduce the number of candidate genes that could distinguish between our leukemic samples of interest. In a first step 
the raw data was scaled using Affymetrix software (target intensity 50 for all genes). 

[0028] To avoid division by zero or negative numbers, which occur due to the current expression algorithm (Affymetrix), all average intensities of 20 or less were set to 20. Briefly, for a more detailed gene expression profiling the data analysis method according to Golub et al. using weighted voting was applied. In a first step, gene expression levels were log-transformed with a cut-off value set at 20 units. To assess the significance of selected genes a leave-one-out cross-validation was performed. Only those genes which were contained in all cross-validation classifiers were considered important. To determine the association between genes by chance, a permutation test (100 cycles) was performed. Because the number of informative genes which are able to discriminate between samples is unknown, the Golub method was applied for different numbers of informative genes (range: 10-200). The minimal set of genes which provided optimal classification accuracy was selected to avoid overfitting.
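The procedure above (flooring at 20, log transformation, weighted voting, leave-one-out cross-validation and retention of genes present in every fold) can be sketched as follows. This is a simplified two-class illustration, not the modified method actually used: the base-10 logarithm, the selection of genes by absolute signal-to-noise score, the use of the sample standard deviation and all data values are assumptions, and the permutation test is omitted.

```python
import math
from statistics import mean, stdev


def preprocess(values, floor=20.0):
    """Set intensities of `floor` or less to `floor`, then log-transform."""
    return [math.log10(max(v, floor)) for v in values]


def signal_to_noise(class1, class2):
    """Signal-to-noise score of one gene for a two-class distinction."""
    denom = stdev(class1) + stdev(class2)
    return 0.0 if denom == 0 else (mean(class1) - mean(class2)) / denom


def train(samples, labels, n_genes):
    """Pick the `n_genes` genes with the largest absolute score.

    Returns (classes, model), where model is a list of
    (gene_index, weight, decision_boundary) tuples.
    """
    classes = sorted(set(labels))
    scored = []
    for g in range(len(samples[0])):
        c1 = [s[g] for s, l in zip(samples, labels) if l == classes[0]]
        c2 = [s[g] for s, l in zip(samples, labels) if l == classes[1]]
        scored.append((g, signal_to_noise(c1, c2), (mean(c1) + mean(c2)) / 2))
    scored.sort(key=lambda m: abs(m[1]), reverse=True)
    return classes, scored[:n_genes]


def predict(classes, model, sample):
    """Weighted voting: each gene votes with weight * (value - boundary)."""
    votes = sum(w * (sample[g] - b) for g, w, b in model)
    return classes[0] if votes > 0 else classes[1]


def leave_one_out(samples, labels, n_genes):
    """Return (accuracy, genes selected in every cross-validation fold)."""
    hits, stable = 0, None
    for i in range(len(samples)):
        classes, model = train(samples[:i] + samples[i + 1:],
                               labels[:i] + labels[i + 1:], n_genes)
        hits += predict(classes, model, samples[i]) == labels[i]
        genes = {g for g, _, _ in model}
        stable = genes if stable is None else stable & genes
    return hits / len(samples), stable


# Hypothetical raw intensities: 6 samples x 4 genes.
raw = [
    [500, 15, 100, 90], [650, 5, 120, 110], [560, -3, 95, 100],   # AML
    [30, 400, 105, 95], [18, 520, 110, 105], [25, 450, 98, 99],   # normal BM
]
labels = ["AML"] * 3 + ["normal"] * 3
data = [preprocess(row) for row in raw]
accuracy, stable_genes = leave_one_out(data, labels, n_genes=2)
```

In the text the same evaluation is repeated for different numbers of informative genes (10 to 200), and the smallest gene set reaching optimal accuracy is kept; in this sketch that would correspond to calling `leave_one_out` in a loop over `n_genes`.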


Example 2 

Identification of genes, the aberrant expression of which is associated with a particular leukemia subtype 

[0029] Monitoring the gene expression level of thousands of mRNA transcripts simultaneously in one experiment is the key technology for finding the specific genes which allow the subsequent development of a class prediction model. Thus, the Affymetrix (Santa Clara, USA) oligonucleotide microarray technology (GeneChip® Instrument System) was used to obtain gene expression profiles of each individual clinical sample of interest. The HG-U95Av2 probe arrays gave information about the relative mRNA abundance of about 12,000 full-length human genes which are represented on these high-density oligonucleotide microarrays.

[0030] In total, 8 bone marrow samples of healthy volunteers and leukemia patients were investigated. Five different 
types of bioinformatic calculations were performed: 

(I) Three distinct genetic subtypes of AML 


[0031] Three defined cytogenetic aberrations, t(8;21)(q22;q22) (n=9), t(15;17)(q22;q12) (n=18) and M4eo with inv(16)(p13q22) (n=10), were selected, corresponding to the 4 FAB subtypes AML M2, M3 or M3v, and M4eo, respectively. After bone marrow aspirates from 37 untreated patients with newly diagnosed AML were obtained, all cases were characterized by cytomorphology, cytogenetics and molecular genetics. AML subtypes M3 and M3v both carry the same chromosomal aberration but differ in morphological aspects such as nuclear configuration and granulation and in clinical aspects such as white blood cell (WBC) count. In all cases, these balanced abnormalities were confirmed by fluorescence in situ hybridization. The corresponding fusion transcript was detected by RT-PCR and/or quantitative real-time PCR. The median age of the patients was 53 years (range, 19-82 years) and did not differ between the respective groups. The median WBC count was 17.0 G/l (range, 0.8-168.0 G/l) and was strikingly lower in patients with AML M3 as compared to all other patients.

Methods used: 

(A) Selection and characterisation of leukemia samples 


[0032] Bone marrow (BM) aspirates from 37 AML patients representing four morphological and three underlying cytogenetic subgroups were obtained from samples sent to the Laboratory of Leukemia Diagnostics (LFL) for central diagnosis (Klinikum Grosshadern, Munich, Germany). They were selected for this study on the basis of several criteria. It was mandatory that none of the patients had been treated. All samples, exclusively newly diagnosed in our laboratory, had to be well characterized as de novo AML, and the diagnosis had to have been proven by cytomorphology, cytogenetics, flow cytometry and molecular genetics in every single case. All samples for gene expression analysis were taken at the time of the initial diagnostic biopsy, when remaining material was immediately lysed in RLT buffer (Qiagen), frozen and stored at -80°C until preparation for gene expression analysis.

(B) Microarray experiments

[0033] For microarray analysis the GeneChip System (Affymetrix, Santa Clara, CA, USA) was used. The targets for GeneChip analysis were prepared according to the current Expression Analysis Technical Manual. Briefly, frozen lysates of the leukemia samples were thawed and homogenized (QIAshredder, Qiagen), and total RNA was extracted (RNeasy Mini Kit, Qiagen). Normally 10 µg total RNA isolated from 1 x 10^7 cells was used as starting material in the subsequent cDNA synthesis using an oligo-dT-T7 promoter primer (cDNA Synthesis Kit, Roche Molecular Biochemicals). The cDNA was purified by phenol-chloroform extraction and precipitated with 100% ethanol overnight. For detection of the hybridized target nucleic acid, biotin-labeled ribonucleotides were incorporated during the in vitro transcription reaction (Enzo® BioArray™ HighYield™ RNA Transcript Labeling Kit, ENZO). After quantification of the purified cRNA (RNeasy Mini Kit, Qiagen), 15 µg were fragmented by alkaline treatment (200 mM Tris-acetate, pH 8.2, 500 mM potassium acetate, 150 mM magnesium acetate) and added to the hybridization cocktail, sufficient for 5 hybridizations on standard GeneChip microarrays. Before expression profiling, "Test3 Probe Arrays" (Affymetrix) were used to monitor the integrity of the cRNA. Only labeled cRNA cocktails which showed a ratio of the measured intensity of the 3' end to the 5' end of the GAPDH gene of less than 3 were selected for hybridization on HG-U95Av2 probe arrays (Affymetrix). Washing and staining of the probe arrays was performed as described. The Affymetrix software (Microarray Suite, Version 4.0.1) extracted fluorescence intensities from each element on the arrays as detected by confocal laser scanning according to the manufacturer's recommendations.


(C) Class separation by principal component analysis and hierarchical cluster analysis 

[0034] In a first step, the dimensionality of the number of genes was reduced. To this end, the data from each array were scaled to a target intensity value of 50 (Affymetrix Microarray Suite) in order to be able to perform inter-array comparisons. Then all data were analyzed using Significance Analysis of Microarrays (Multiclass Response, Stanford University) and 580 genes were selected based on a permutation test. This reduced set of significant genes was then analyzed using the Java application J-Express. Principal Component Analysis and Hierarchical Cluster Analysis (parameters: cluster method single linkage, distance metric Euclidean) showed a clear separation of the analyzed groups of samples, e.g. healthy bone marrow versus leukemia.


(D) Identification of differentially expressed genes according to Golub 

[0035] This analysis was carried out as described in Example 1 (C), above. Briefly, classification of tumor samples was achieved by using a set of samples whose class had already been determined. This set was called the training set. By using the oligonucleotide microarrays (Lockhart et al., Nat. Biotechnol. 14 (1996), 1675-1680), the transcript levels in the training set samples were measured for those genes that were represented on the microarray. The values for "transcription strength" were determined by averaging the values of a set of probes which were compared to a set of nearly identical probes containing a single mismatch. This was performed by using methods provided with the oligonucleotide arrays of Affymetrix Inc.


(E) Principal Component Analysis, Classifier and Decision Trees

[0036] In order to obtain comparable values between different samples, they had to be standardised first. The method followed that described in Beißbarth et al. (Nat. Biotechnol. 14 (1996), 1675-1680), except that the correction for (additive) background was omitted. In brief, the data from one of the samples were declared to serve as a "standard", and the values from all other samples were adapted to this standard. For every possible comparison to this standard, a set of "reliable" values was determined by calculating the correlation coefficient for a series of intervals of increasing length. The lower bound of reliability was the bound of the interval that had a correlation coefficient less than or equal to that of the smaller intervals. From all reliable values, a (logarithmised) correction factor was calculated by computing the median of the differences of the logarithmic values. Values that were zero or negative prior to taking the logarithm were not taken into account.
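The last step of this standardisation, a logarithmic correction factor obtained as the median of the differences of log values, can be sketched as below. The sketch deliberately skips the interval-based determination of "reliable" values and treats every pair of positive values as reliable, which is a simplifying assumption; the function names and numbers are hypothetical.

```python
import math
from statistics import median


def log_correction_factor(standard, sample):
    """Median of log(standard) - log(sample) over all usable value pairs."""
    diffs = [math.log(s) - math.log(x)
             for s, x in zip(standard, sample)
             if s > 0 and x > 0]  # zero or negative values are not taken into account
    return median(diffs)


def adapt_to_standard(standard, sample):
    """Rescale `sample` so that its values become comparable to `standard`."""
    factor = math.exp(log_correction_factor(standard, sample))
    return [x * factor for x in sample]


# Hypothetical intensities; the fourth and fifth pairs are ignored for the factor.
standard = [100.0, 200.0, 400.0, -5.0, 50.0]
sample = [50.0, 100.0, 210.0, 30.0, 0.0]
adapted = adapt_to_standard(standard, sample)
```

Using the median rather than the mean keeps the correction factor robust against a minority of genes that genuinely differ between the sample and the standard.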

[0037] The obtained data matrix contained the values from one sample per column. The gene expression profile across all samples for one gene or gene fragment represented on the oligonucleotide microarray was contained in a row of the matrix. To allow for rapid calculation of the classifier and to reduce memory usage, certain genes were pre-selected from the set of all genes represented on the array. The following criteria were applied:

£k o a) 

45 



50 -=4 >t (2) 

55 m refers to the average of the i-th class (/=!,. ..,k), u, to the total average, Oj to the standard deviation of the i-th class 
and rto an arbitrary treshold £ 1 . Selection by these methods resulted typically in a reduction in the number of genes 
by a factor of 10-30. To check the quality of the selection procedure, the first two principal components (Jolliffe, Principal 
Component Analysis (1986), Springer (New York)) for the samples were plotted. This showed whether or not 
a rigorous discrimination between the different classes was possible. 
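
As an illustration of this quality check, the scores of the samples on the first two principal components can be computed as sketched below. The sketch assumes the matrix layout given above (one gene per row, one sample per column). The function names, the use of power iteration and the iteration count are assumptions made for illustration; the source does not state how the components were calculated.

```python
import math


def center_rows(matrix):
    """Subtract each gene's mean across samples from that gene's row."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def gram_matrix(matrix):
    """Sample-by-sample matrix of inner products taken over all genes."""
    n = len(matrix[0])
    return [[sum(row[a] * row[b] for row in matrix) for b in range(n)]
            for a in range(n)]


def top_eigenpair(sym, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration.

    The iteration count and the start vector are arbitrary choices.
    """
    n = len(sym)
    vec = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-constant
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(sym[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
        value = norm
    return value, vec


def sample_scores(matrix, n_components=2):
    """Scores of each sample on the leading principal components."""
    gram = gram_matrix(center_rows(matrix))
    n = len(gram)
    scores = [[] for _ in range(n)]
    for _ in range(n_components):
        value, vec = top_eigenpair(gram)
        scale = math.sqrt(max(value, 0.0))
        for j in range(n):
            scores[j].append(scale * vec[j])
        # Deflate: remove the component just found before the next pass.
        gram = [[gram[a][b] - value * vec[a] * vec[b] for b in range(n)]
                for a in range(n)]
    return scores


# Hypothetical data: 4 genes (rows) x 6 samples (columns), two classes.
expression = [
    [10.0, 11.0, 9.0, 1.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 10.0, 9.0, 11.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    [3.0, 4.0, 3.0, 4.0, 3.0, 4.0],
]
points = sample_scores(expression)  # one [PC1, PC2] pair per sample
```

Working on the sample-by-sample matrix keeps the computation small when there are far more genes than samples, as is usual for microarray data. Plotting the resulting pairs, coloured by class, gives the kind of picture referred to above.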

[0038] For construction of the classifier, decision trees (Breiman et al., Classification and regression trees, Wadsworth 
& Brooks/Cole (Monterey)) were used. Simple decision trees that discriminate between n classes by using only the 
transcription levels of (n-1) genes were used. They were trained, and the selected genes were then discarded from the 
original data set. A new tree was constructed by using the truncated data set, and the entire procedure was iterated 
until a predetermined number of trees was reached. The optimal number of trees could be estimated by counting the 
number of misclassifications of classifiers built from different numbers of trees. For this, an independent data set or 
cross-validation had to be used. The final vote of the multi-classifier was obtained by applying a vote-by-majority rule 
to the predictions of the contained trees. In the example of the present invention, 15 decision trees were used for 
the multi-classifier. This allowed correct classification of 100% of the samples, discriminating between classes that 
were given by chromosomal aberrations. To estimate generalisation properties, i.e. how accurately the classifier may 
perform on samples that have not been used for training, cross-validation was used (Efron and Tibshirani, An 
introduction to the bootstrap (1993), Chapman & Hall (New York, London), pp. 237-247). 
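
The iterative procedure can be sketched as follows for the two-class case, in which a tree on (n-1) genes reduces to a single-gene threshold rule (a decision stump). This is a simplified illustration rather than the classifier used in the example: the function names, the midpoint thresholds and the error-count criterion are assumptions, and the tree-growing procedure of Breiman et al. is not reproduced.

```python
from collections import Counter


def train_stump(samples, labels, genes):
    """Pick the gene and threshold with the fewest training errors.

    Assumes exactly two classes, so that n-1 = 1 gene per tree.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    best = None  # (errors, gene, threshold, label_below, label_above)
    for gene in genes:
        values = sorted({sample[gene] for sample in samples})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # midpoint split: an assumption
            for below, above in (classes, classes[::-1]):
                errors = sum(
                    (below if sample[gene] <= threshold else above) != label
                    for sample, label in zip(samples, labels))
                if best is None or errors < best[0]:
                    best = (errors, gene, threshold, below, above)
    if best is None:
        raise ValueError("no gene varies between samples")
    return best[1:]


def train_ensemble(samples, labels, n_trees):
    """Train stumps one by one, discarding each selected gene."""
    remaining = list(range(len(samples[0])))
    stumps = []
    for _ in range(n_trees):
        stump = train_stump(samples, labels, remaining)
        stumps.append(stump)
        remaining.remove(stump[0])  # truncate the data set
    return stumps


def predict(stumps, sample):
    """Vote-by-majority over the predictions of all stumps."""
    votes = Counter(below if sample[gene] <= threshold else above
                    for gene, threshold, below, above in stumps)
    return votes.most_common(1)[0][0]


def leave_one_out_errors(samples, labels, n_trees):
    """Count misclassifications when each sample is held out in turn."""
    errors = 0
    for i in range(len(samples)):
        stumps = train_ensemble(samples[:i] + samples[i + 1:],
                                labels[:i] + labels[i + 1:], n_trees)
        errors += predict(stumps, samples[i]) != labels[i]
    return errors


# Hypothetical data: one sample per row, one gene per column.
samples = [[9, 1, 5, 8, 3], [8, 2, 6, 9, 4], [9, 1, 4, 7, 3],
           [2, 8, 5, 1, 3], [1, 9, 6, 2, 4], [2, 7, 4, 3, 3]]
labels = ["A", "A", "A", "B", "B", "B"]
ensemble = train_ensemble(samples, labels, n_trees=3)
```

An odd number of trees avoids tied votes in the two-class case. The leave-one-out count can be compared across different values of `n_trees` to choose the size of the multi-classifier.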

Results (Golub Method) 

[0039] From this point of view it was found that a set of 17 genes was sufficient to distinguish distinct AML subtypes 
from each other with high precision (Table 1). The classification model was able to identify the 4 morphologically and 
3 cytogenetically and molecular-biologically different AML subtypes: AML with t(8;21), with t(15;17), and with inv(16) (Figures 
1a-b, 2). 

[0040] In conclusion, by comparison of gene expression profiles of AML samples (3 tested genetic subtypes: t(8;21), 
t(15;17) and inv(16)), genes could be identified which allowed a detailed differentiation between the individual AML subtypes. 
It could be shown for the first time that these distinct abnormalities on the genomic level relate to a specific 
gene expression pattern. In other words, in the experimental setting the knowledge of the expression status of these 
designated genes was sufficient to predict the genetic abnormality and allowed the diagnosis of specific genetically 
defined subtypes of AML (Table 1). 

[0041] Results of methods described in I(E) are shown in Table 2 and Figures 3a+b, 1/2 and 4. 

(II) Pair-wise comparisons between normal bone marrow, AML, ALL, CML, and CLL 

[0042] By pair-wise comparisons, the gene expression profiles of 8 cases of normal bone marrow, 48 AML, 9 ALL, 8 
CML, and 7 CLL were evaluated. These comparisons led to the identification of subtype-specific genes (Tables 3-12; Figures 5a-c, 
6a-c, 7a-c, 8a-c). 

(III) AML classified according to WHO proposal 

[0043] To allow classification of AML subtypes according to the new WHO proposal, the gene expression profiles of 
four genetically defined AML subtypes (t(8;21) n= 9; t(15;17) n= 18; inv(16) n= 10; 11q23/MLL aberrations n= 11) were 
used. This led to the identification of subtype-specific genes (Table 13, Figures 9a-c). 

(IV) Normal bone marrow versus distinct genetic subtypes of AML 

[0044] The gene expression profiles of normal bone marrow (n= 8) and of four genetically defined AML subtypes 
(t(8;21) n= 9; t(15;17) n= 18; inv(16) n= 10; 11q23/MLL aberrations n= 10) were used. This led to the identification of 
genes that allow the distinction between normal bone marrow and each of the four AML subtypes (Table 14). 

(V) Identification of genes specifically separating normal bone marrow, AML, ALL, CML, and CLL 

[0045] The gene expression profiles of normal bone marrow (n= 8) and of AML (n= 48), ALL (n= 9), CML (n= 8), and 
CLL (n= 7) were used. This led to the identification of genes that allow the distinction between normal bone marrow 
and each of the four leukemia subtypes (Table 15, Figures 10a-c). 



Claims 

1. A diagnostic composition comprising at least one nucleic acid molecule which is capable of specifically hybridizing 
to the mRNA of at least one gene listed in Table 1-15. 

2. The diagnostic composition of claim 1 comprising at least 20 nucleic acid molecules which are capable of specifically 
hybridizing to the mRNAs of at least 20 genes listed in Table 1-15. 

3. The diagnostic composition of claim 2 comprising at least 40 nucleic acid molecules which are capable of specifically 
hybridizing to the mRNAs of at least 40 genes listed in Table 1-15. 

4. The diagnostic composition of any one of claims 1 to 3, wherein the nucleic acid molecules are bound to a solid 
support or to non-immobilised particles in solution. 

5. The diagnostic composition of claim 4, wherein the solid support is a microarray. 

6. The diagnostic composition of any one of claims 1 to 5, wherein the nucleic acid molecule has a length of at least 
13 nucleotides. 

7. Use of (a) nucleic acid molecule(s) as defined in any one of claims 1 to 6 for the preparation of a diagnostic 
composition for the diagnosis of a leukemia or a disposition to a leukemia. 

8. Use according to claim 7, wherein the leukemia is acute myeloid leukemia. 

9. Use of (a) nucleic acid molecule(s) as defined in any one of claims 1 to 6 for the preparation of a diagnostic 
composition for the diagnosis of a leukemia-subtype. 

10. Use according to claim 9, wherein the diagnosis is AML, CML, ALL or CLL. 

11. Use according to claim 10, wherein the FAB-subtypes are AML M2, AML M3, AML M3v, AML M4eo and/or AML 
with 11q23/MLL aberrations. 

Figure 4: Decision trees according to I(E) 

Invention 1: 

Diagnostic composition comprising at least one nucleic acid 
molecule which is capable of specifically hybridizing to the 
mRNA of GAR22 and the use of such a nucleic acid for the 
preparation of a diagnostic composition for the diagnosis of 
leukemia. 



2. Claims: 1-11 (all partially) 
Inventions 2-1700: 

Diagnostic composition comprising at least one nucleic acid 
molecule which is capable of specifically hybridizing to the 
mRNA of HLA-DPA1 and the use of such a nucleic acid for the 
preparation of a diagnostic composition for the diagnosis of 
leukemia. 

... ibidem for each gene listed in Tables 1-15 